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Abstract 

The global sensitivity analysis of a complex numerical model often calls for the es- 
timation of variance-based importance measures, named Sobol' indices. Metamodel- 
based techniques have been developed in order to replace the cpu time-expensive 
computer code with an inexpensive mathematical function, which predicts the com- 
puter code output. The common metamodel-based sensitivity analysis methods are 
well-suited for computer codes with scalar outputs. However, in the environmental 
domain, as in many areas of application, the numerical model outputs are often 
spatial maps, which may also vary with time. In this paper, we introduce an inno- 
vative method to obtain a spatial map of Sobol' indices with a minimal number of 
numerical model computations. It is based upon the functional decomposition of the 
spatial output onto a wavelet basis and the metamodeling of the wavelet coefficients 
by the Gaussian process. An analytical example is presented to clarify the various 
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steps of our methodology. This technique is then applied to a real hydrogeological 
case: for each model input variable, a spatial map of Sobol' indices is thus obtained. 

Keywords: Computer experiment, Gaussian process, metamodel, functional data, ra- 
dionuclide migration. 

Short title: Spatial global sensitivity analysis 

1 INTRODUCTION 

Today, in different environments, there are sites with groundwater contaminated because 
of an inappropriate handling or disposal of hazardous materials or waste. Such environ- 
mental or sanitary issues require the development of treatment or remediation strategies 
and, in all robust long-term prediction of behaviour. The indispensable simulation 

of global fluxes, such as water or pollutants, through the different environmental compart- 
ments involves many parameters. Numerical modeling is an efficient tool for an accurate 
prediction of the spreading of the contamination plume and an assessment of environmen- 
tal risks associated to the site. However, it is well known that many input variables, such 
as hydrogeological parameters (permeabilities, porosities, etc.) or boundary and initial 
conditions (contaminant concentrations, aquifer level, etc.), are highly uncertain in the 
complex numerical models. A systematic and exhaustive 3D characterization of sites is 
still impossible. 

To deal with all these uncertainties, computer experiment methodologies based upon 
statistical techniques are useful. For instance, we assume that Y = /(X) is the real-valued 
output of a computer code /. Its input variables are random and modeled by the random 
vector X = (Xi, . . . , Xj) 6 X, X being a bounded domain of M. d , of known distribution. 
The uncertainty analysis step is used to evaluate statistical parame ters, confidence inter- 
vals or the density probability distribution of the model response (jpe Rocquigny et ah . 
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20081 ). while the global sensitivity analysis step is used to quantify the influence of the 
uncertai nties of the model in put variables (in their whole range of variations) on model re- 



sponses (ISaltelli et al. 



20001 ) . Recent studies have applied di fferent statistical methods o: 



uncert ainty and sensitivi t y analysis to enyironmental models ( 



1998 



Fasso et all . 12003 



Volkova et al 



2008 



Helton. 



1993; 



Lilburne and Tarantolal . [2009) . All these 



Nychka et al. 



methods have shown their efficiency in providing guidance to a better understanding of 
the modeling. 

However, for the purpose of sensitivity analysis, four main difficulties can arise due to 
practical problems, especially when focusing on environmental risks: 

PI) physical models involve rather complex phenomena (they are non linear and subject 
to threshold effects) sometimes with strong interactions between physical variables 

P2) computer codes are often too cpu time expensive to evaluate a model response, from 
several minutes to weeks 

P3) numerical models take as inputs a large number of uncertain variables (typically 
d > 10) 

P4) the outputs of these numerical encompass many variables of interest, that can vary 
in space and time. 

The first problem PI is solved using variance-based measures. These o nes can handle 
non- 



inear and non-monotonic relationships between inputs and output (ISaltelli et al. 



20001 ). These measur es are based upon th e functional ANOVA decomposition of any 



integrable function / (jEfron and Stein 



19811 ) and determine how to share of the v ariance 



of th e output resulting from a variable Xi or an interaction between variables (jSobol . 



19931 ): 



Si 



Var [E (Y\Xj)] 
Var(F) 



Si 



Var [EjYlX^Xj)] 
Var(F) 



Si S 



J I 



s, 



ijk 



The interpretation of these coefficients, namely the Sobol' indices, is natural as all indices 
lie in [0, 1] and their sum is one in the case of independant input variables. The larger the 
index value, the greater the importance of the variable related to t h is ind ex. To express 



the overall output sensitivity to an input Xi 
total sensitivity index: 



£>Ti — Si + 2J Sij + 2J Sijk + . . . — 2^ &i — 1 



Homma and Saltellil (I19961 ) introduce the 



i<j 



i<j<k 



le#i 



Var [E (Y\X^)] 
Var(y) 



(2) 



where j^i represents all the "non-ordered" subsets of indices containing index i and X^i is 
the vector of all inputs except Xi. Thus, $i * s the sum of all the sensitivity indices 

with index containing i. 

Unfortunately, the traditional or advanced Monte Carlo methods, which are used to 
estimate first order and total Sobol' indices, require a large number of model evaluations 



( partem et al. 



20101 ). To overcome the problem P2, of too long a calculation time, and 
make uncertainty and sensitivity a nalysis tractable, various approaches based upon met a 



modeling were recent l y pro posed (IKoehler and Owen 



1996 



Oakley and O'Hagan , 



Kleijnen and Sargent 



2000 



20021 ). The key point consists of replacing the complex computer 
code by a mathematical approximation, called a metamodel, which is fitted from only a 
few experiments. The metamodel rep roduces the behav i or of the compute r code in the 



domain of its influential parameters (ISacks et al. 



1989 



Fang et al. 



20061 ). Among all 



the metamodel-based solutions (polynomials, splines, neural networks, etc.), we focus our 
attention on the Gaussian process (Gp) model. It can be vie wed as an extension of th e 



kriging method, which is used for int e rpola t ing data in space (IChiles and Delfiner 



to co mputer code data ( Sacks et al. 



(e.g. 



Welch et al. 



1992 



Marrel et al. 



1989 



Oakley and O'Hagan , 



19991) 



20021 ). Many authors 



20081 ) have shown how the Gp model can be used 



as an efficient emulator of code responses, even in high dimensional cases (problem P3). 
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In this paper, we consider models subject to the four problems together (PI, P2, P3 
and P4), which is an usual cas in model-based environmental studies. We mainly pay 
attention to problem P4, that is the possible high dimension of model outputs. In the 
application case studied in this paper, the costly numerical model yields spatial concentra- 
tion maps. These spatial outputs encompass several thousands of grid blocks, each with a 
concentration value. This kind of problem cannot be tuned to a vectorial output problem 
because of its dimensionality: the metamodel ing of this v e ctoria l output cannot be solved 



20061 ). Therefore, we consider 



referring to kriging or cokriging techniques ( iFang et all 
the model output as a functional output synthesized by its projection on an appropriate 
basis. This problem of building a metamodel (based upon functional decomposition and 
Gp modeli ng) for a f uncti onal output has re c ently been addressed for one- dimensional 



outputs by 


Sh 


i et al. 


Higdon et al. 


(2008) 



fl2007h and 



Bayarri et al. 



( 120071 ) and for two-dimensional outputs by 



In the case of sensitivity analysis, a functional output is usually considered as a vecto- 
rial output and sensi tivity indices relative to ea ch input are computed for each discretized 



value of the output ( IDe Rocquigny et al 



20081 ). To avoid the large amount of sensitivity 



index computations when applying such an approach, a few authors referred to various 



basis decomposit i ons on the functiona 



(ICampbell et al. 



2006 



Lamboni et al. 



outpu t, such as the principal component analysis 



20091 ) . Then, sensitivity indices are obtained for 



the coefficients of the expansion basis. 

However, the full functional restitution of Sobol' indices remains an unexplored chal- 
lenge. In this paper, we propose an original and complete methodology to compute Sobol' 
indices at each location of the spatial output map . Our approach con sists of building a 



Bayarri et al. 



( 20071 ) (restricted to the 



metamodel based upon wavelet decomposition as in 
case of a temporal output). This metamodel is then used to compute spatial Sobol' index 
maps (one map per input variable). A map of the Sobol' index of an input X, shows the 
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local and global influences of this input on the output. It can help to better understand 
the computer code results and can used to reduce more efficiently the uncertainties in the 
responses. Thus, to reduce the output variability at a given point of the map, we analyze 
all Sobol' maps and determine the most influential inputs. Then, we can try to reduce 
the uncertainty of these inputs by accounting for additional measures. In addition, the 
global influence of each input over the whole space can be investigated to identify areas 
of influence and non- influence for this input. 

Details about the Gp metamodel are given in the following section. Then, a step by 
step description of our methodology is given in Section 3. A synthetic test function is used 
to evidence the relevance of our choices and estimate the convergence of the algorithms. 
Section 4 presents how our methodology is applied to a real environmental problem, which 
calls for the modeling of radionuclide groundwater migration (MARTHE code). Then, a 
few points are discussed at the end of this paper. 

2 GAUSSIAN PROCESS METAMODELING 

This section introduces the Gp metamodel for the case of a single scalar output. We 
consider n realizations of a computer code. Each realization y = /(x) G 1 is an output of 
the computer code and corresponds to a d- dimensional input vector x = {x\, . . . , x^) G X. 
The n points corresponding to the code runs are called the experimental design and 
are denoted as X s = (x^, . . . , x( n )). The outputs are denoted as Y s = (y^\...,y^) 
with j/W = y(xW) V i = l..n. Gp modeling treats the deterministic response y(x) as a 
realization of a ran dom function Yop (x). This includes a regression part and a centered 



stochastic process ( packs et al. 



19891 ) . It can be written as: 



y Gp (x) = / (x) + Z(x) . (3) 
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The deterministic function /o( x ) provides the mean approximation of the computer 
code. In our study, we use a one-degree polynomial model with /o( x ) written as: 



/o(x) = /3 + ^/?. 



j x j ) 



w here = \0n, ■ ■ ■ , Pk]* }s the r egre ssion paramet e r vect or. It has been shown, for example 



in 



Martin and Simpson! ( 120051 ) and 



Marrel et al. 



(120081 ) . that such a function is sufficient, 



and sometimes necessary, to capture the global trend of the computer code. 

The stochastic part Z(x) is a Gaussian centered process fully characterized by its 
covariance function: Cov(Z(x), Z(u)) = cr 2 Z?(x, u), where a 2 is the variance of Z and 
R the correlation function. For simplicity, we consider a stationary process Z(x), which 
means that correlation between Z{x) and Z(u) is a function of the distance between x and 
u. Our study focuses on a particular family of correlation functions that can be written 
as a product of one-dimensional correlation functions Ri\ 



Cov(Z(x), Z{u)) = (j 2 i?(x - u) = a 2 Y[ Ri( Xl - 



Ui 



i=i 



This form of correlation function is particularly well-suited to simplify mathemat ical de- 



velopments in analytical uncertainty and sensitivity analyses ( iMarrel et al. 



2009). More 



precisely, we use the generalized exponential correlation function: 



R 6,p( x ~ u ) = JJexp(-^|x; - U;| Pi ), 



i=i 



where 6 = [9\, . . . , 0$ and p = \pi, ■ ■ ■ ,PdY are the correlation parameters (also called 
hyperparameters) with Q\ > and < pi < 2 V I = l..d. This choice is motivated by the 
wide spectrum of shapes that such a function offers. 



If a new point x* 



x 



i- 



, Xj) G X is considered, we obtain the following predictor 



and variance formulas: 



E[y Gp (x*)] = / (x*) + fe(x*)'E7 l (y. - f{x.)) 

Var[r Gp (x*)] = a 2 - fc(x*)*S; 1 fc(x*) 



(4) 
(5) 



with Y Gp denoting (Y\Y S , X S1 0, a, 6,p), 



fe(x*) = [Cov(yW,y(x*)),...,Cov(y( n ) ! y(x*))]* 

= ^[%(x (1) -x*),..,^(xW-x'))r 



and the covariance matrix 



R e, P ( ; 



,0") 



) i=l..n,j=l..n) 



Regression and correlati on parameters 0, a, 6 and p are usually estimated by maxi- 



mizing likelihood functions (jFang et al. 



20061 ). This optimization pr oblem can be 



conditioned and difficu 
and 



Marrel et al. 



t to solve in high dimensional cases (d > 5). 1 Welch et al 



badly 



(ll992|) 



(120081 ) developed algorithms to build Gp metamodels on outputs that 
have a non-linearity depending on quite a large number of input variables. 

The conditional mean (Eq. ((H)) is used as a predictor. The variance formula (Eq. 
(jSJ)) corresponds to the mean squared error (MSE) of this predictor and is also known 
as the kriging variance. This analytical formula for MSE gives a local indicator of the 
prediction accuracy. More generally, the Gp model provides an analytical formula for 
the distribution of the output variable at any arbitrary new point. This distribution for- 



mula can be used to develop ana 



l r 



(jOakley and O'Haganl. 



plication, 



Marrel et al. 



2002 



ytical formula for uncertainty and sensitivity analyses 



20041 ) . Studying several test functions and one industrial ap- 



(]2009i ) showed that this analytical approach is efficient to compute 
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the first order Sobol' indices Si (Eq. (flj)). In addition, it provides confidence intervals for 
the estimates. However, the analytical approach does not yield any direct estimation of 
the total Sobol' indices (Eq. (j2j)) and deals only with uncorrected inputs. 



3 METHODOLOGY FOR A SPATIAL OUPUT 



In this section, we descr ibe the metho dology that we use to compute spatial Sobol' index 



maps (first proposed in 



Marrel . 



20081 ). We also apply this methodology to an analytical 



function in order to study the convergence of the algorithms. 



3.1 General principles 

For a given x* value of vector X = (Xi, . . . , Xj), the code output is now a deterministic 
function y(x*,z) where z denotes a vector of dimension p of spatial coordinates. In this 
paper, we focus on two-dimensional cases. Thus, the target outputs are two-dimensional 
maps. Thus, z varies in a grid on a compact set D z of 1R 2 and corresponds to an index 
for the outputs. Variables X and z are of very distinct natures: variables Xi, . . . ,JQ 
which correspond to the inputs of the computer code, are random. They are different 
for each simulation of the code. Our objective is to perform a sensitivity analysis with 
respect to these variables. Variables z are deterministic and vary on a grid of size n z 
which corresponds to a discretization of D z . The grid is the same for each simulation of 
the code and the output corresponds to the n z values y(x*,z) for z describing the grid. 
For example, the MARTHE model described in Section H] has d = 20 input variables and 
yields at each simulation a map with n z = 64 x 64 = 4096 points. 

Because of the different natures of variables X and z, the dependency of the output 
with respect to these two variables is represented from two different ways. For a fixed 
value of X, we use a projection of map z i— > V(X, z), onto an orthonormal wavelet basis. 



9 



The coefficients of the projection depend on X. We select the coefficients with the largest 
variance and model these coefficients with respect to the d- dimensional input variable X. 
In most applications, the dimension of X is quite large and each simulation of the code 
is time-expensive. Therefore, we need a method able to deal with a limited number of 
simulations and imput vectors of large dimension. In addition, the relationship between 
the input variables and the coefficients is expected to be highly non-linear. We therefore 
use the Gp metamodel, described in the previous section, to model the dependency of 
each selected coefficient with respect to X. 

Therefore, for a given input design X s = (xW) ^ and the n corresponding simu- 
lations of the map (y{yS % \ zj), j = 1, . . . ,n z ), i = l,...,n, the three main steps of the 
method are: 

1. Decomposition of the maps z h-> y(xW, z) onto a two-dimensional wavelet basis 

2. Selection of the coefficients with the largest variance 

3. Modeling of the coefficients with respect to the input variables using a Gp. 

At each step, we use various criteria to evaluate the performance of our procedures. We 
are then able to predict a map (y(x*, Zj), j = 1 • • • , n z ) for a new value of the input vector 
x*. Of course, this method of map prediction (which we call a functional metamodel or 
also, in our case, a spatial metamodel) has the advantage, compared to the simulation of 
the code, to be much less time-expensive. 

Finally, our functional metamodel allows us to produce maps of sensitivity analysis 
based upon Sobol' indices by using Monte Carlo methods (see introduction, Eqs. §B) and 
02])). As mentioned previously, the direct use of the computer code is impossible because 
of the required number of function evaluations. This study is restricted to the estimation 
of the first order indices Si(z) and total Sobol' indices StX z ) f° r z e D z . These two 
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indices allow us to quantify the individual and total influence for each input. Then, the 
degree of interaction with other inputs can then be deduced. 



3.2 An analytical test case: The Campbell2D function 



he a nalytical function used in this section to perform various tests is inspired by lCampbell et al. 



( 120061 ) who considered a function with four inputs and a one-dimensional output. It 
was converted to a function with eight inputs (d = 8) and a two-dimensional output 

(z = (z 1 ,z 2 )): 



Y = g(X,z l7 z 2 ) = X 1 exp 
+X 5 (A 3 -2)exp 



(0.8zi + 0.2z 2 - 10X 2 ) 
QOXf 

(0.4^1 + 0.622 - 20X 6 ) 2 ' 

iaxf 



+ (X 2 + X A ) exp 
+ (X 6 + X 8 ) exp 



(0.5z 1 + 0.5z 2 )X 1 
500 

fO.3^1 + 0.7z 2 )X r 



250 



(6) 



where (zi,z 2 ) E [— 90, 90] 2 represent azimuthal and polar spatial coordinates and Xj ~ 
U[— 1,5] for i = 1 . . . 8. This function, called the Campbell2D function, gives a spatial 
map as output (Figure [T|). The Campbell2D function has been calibrated in order to 
give strong spatial heterogeneities, sometimes with sharp boundaries, and very different 
spatial distributions of the output values according to the X values. 

For the Campbell2D function, it is possible to calculate the first order Sobol' indices 
Si(z\,z 2 ). Appendix A gives the results of these integrations. The resulting analytical 
expressions (Eqs. ( IT6|) to ( 123]) ) provide the exact solutions of the first order Sobol' indices. 
However, analytical calculations of the total Sobol' indices St^Zi, z 2 ) (Eq. ([2])) are not 
possible . We e stimate StX z 1i z 2)i i 8, by using Saltelli's Monte Carlo algorithm 



( iSaltellil . 



2002|) with N = 10 5 . Thus, the Campbell2D function was computed N(d + 2) = 
10 6 times. The estimated errors with such large sample sizes are of the order of 5 x 10~ 3 
(standard deviation estimated via bootstrap). These estimates StX z Ii z< i) are henceforth 
called exact total Sobol' indices. 



11 



-50 50 -50 50 -50 50 

a 6 8 

Figure 1: Three different output maps from the Campbell2D function: x = 
(-1,-1,-1,-1,-1,-1,-1,-1) (left); x = (5,5,5,5,5,5,5,5) (center); x = 
(5,3,1,-1,5,3,1,-1) (right). 

Figure [2] gives the maps of the total Sobol' index estimations. Input X 5 has no influence 
on the output of the Campbell2D function. Input X± has a small influence on the output 
of the Campbell2D function. Input X% has a mild influence in a diagonal axis of the 
spatial domain. Inputs and X$ have mild influences in a large part of the spatial 
domain. Inputs X 2 , X e and X 7 have strong influences in different parts of the spatial 
domain (located in corners for Xi and X 7 ). Moreover, the first order Sobol' indices (maps 
not shown here) for X 3 , X 6 and X 7 are far from the total Sobol' indices. As shown 
by formula ([6]), these four variables have some strong interactions (interactions between 
X 3 , X 5 and X e and between Xq and X 7 ). 

3.3 Spatial metamodeling 

The spatial metamodeling process is composed of 5 internal steps. 
Step - Preparation of the learning sample 

When dealing with a large input dimension d, the choice of the input design X s = 
is very important, especially when n is small. For scalar computer model 
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Figure 2: Total Sobol' indices of the 8 input variables of the Campbell2D function, 
mated by Monte Carlo algorithm. The color scales are the same for all the plots. 



esti- 



output, numerous au thors stressed the s t rong i nfluence of t 



of the Gp modeling (IKoehler and Owen 



1996 



Fang et ai- 



re inp ut design on the quality 



20061 ) . For instance, maximin 



Latin hypercube sam ples and low-discrepancy 



provide good results (jMarrel 



2008 



Iooss et al. 



atin hypercube samples were shown to 



20101 ) . However, building good input de- 



signs for functional output still remains an open question which could be the subject of 
future work. 

For our tests with the Campbell2D function, we use maximin Latin hypercube samples. 
Once the input design is defined, we obtain n simulations of the map (j/(xW,z)) n by 
running the numerical model. 

Step 1 - Spatial decomposition and selection of coefficients 



The spatial decomposition of the output map is made on a basis of orthogonal functions 
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3 J j&i* ' 



oo „ 

y(X,z) = M (z)+y)a J -(X)0 i (z)witha J -(X)= / [Y(X,z) - ^(z)]&(z)dz , (7) 

where /i(z) = Ex|X(X, z)]. We define Ysr(X,z) as the truncated decomposition at order 
K: 



K 



^(X,z)=Mz) + ^a,(X)^(z). (8) 

3=1 

For the function ba sis, various wavelet bases can be considered (Haar, Daubechies, Symm- 



let and Coiflet, see 



Misiti et al. 



20071 ) in order to optimize the compression of the local 
and global information. In the following tests, we use the Daubechies basis which offered 
the best results. 

The selection of a small number k of coefficients Oj(X) to be modeled with Gp is 
essential. For instance, MARTHE maps (see Section H]) and Campbell2D maps contain 
n z = 64 x 64 = 4096 pixels, which leads to K — 4096 wavelet coefficients. Modeling 
such a number o f Gp seems in t racta ble because the building process of one Gp is CPU 



time consuming ( IMarrel et al. 



20081 ). It is therefore necessary to model with Gp only 
the most informative coefficients. The criterion considered for selecting the coefficients 
involves their variance with respect to X: priority is given to coefficients which explain 
at most the output map variability. Mathematically, the new order of the coefficients 
{cki, . . . , ax} is written {«(!), . . . , ct(K)} following the inequalities 



1 n 1 71 

i=i i=i 



«(a-)(x w ) - a^ K) j with aj 



1 n 

i=l 



MY 



(9) 



The number k of Gp-modeled coefficients will be discussed in Step 3. 



Step 2 - Modeling the coefficients 
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For j = 1, . . . , K, the model A/(X) used for approximating the coefficient <x, (X) is one of 
the following models listed below: 

1 n 

• Model 1: the empirical mean: A/(X) = — a j (x^); 

• Model 2: the linear regression model: 



i=l 



A j (X) = p j + Y l l3ijX l 



(10) 



fitted on the learning sample (xW,a 3 -(xW))._ 1 . We use an AIC selection process 
to keep only the significant terms in (TTUT) : 



Model 3: the Gp model of form ([3]) as described in iMarrel et al.l (120081 ). The deter- 
ministic part /o(X) is a linear regression model as in ffTO]) with a selection process 
bas ed on AICC (a modifi ed AIC in order to take spatial correlations into account, 



sec 



Hoeting et al. 



(120061 ) ). The generalized exponential function is used for the 



correlation function R(-) of the stochastic part Z{X). The building of this model 
is rather costly, especially in a high dimensio nal context ( d > 10 ) because of the 



specific variable selection process proposed by 



Marrel et al 



fcooj). 



In the following two steps, we compare three different methodologies in order to stress 
the benefit of an appropriate metamodel choice: 

• Method 1: Model 3 for the k selected coefficients and model 1 for the other coeffi- 
cients 

• Method 2: Model 2 for the k selected coefficients and model 1 for the other coeffi- 
cients 

• Method 3: Model 3 for the k selected coefficients, model 2 for the k! following 
coefficients (k' 3> k) and model 1 for the K — k — kl other coefficients. For the 
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Campbell2D function, setting k! to 500 is a heuristic choice based upon the obser- 
vation that, in the case studied, the information in terms of variability is explained 
by 10% of coefficients. More generally, a convergence study can be made in order 
to find a suitable value for k' . 

We now define Yr-^X, z) the approximation of Ik (X, z) (Eq. flH])) using one of the three 
previous methods. 

Several adequacy criteria can be used to measure the discrepancy between the function 
y(X, z) and its approximation Yr-^X, z). We use the mean absolute error, the maximal 
error and the mean squared error but restrict our presentation to mean squared error 
results for the sake of consistency. The mean squared error MSE(X) is written 

MSE(X) = / |y(X,z) - Y K)k (X,z)] 2 dz . (11) 
Jd z l - 1 

MSE(X) is estimated by integrating over the n z grid. For a fixed value of X, this criterion 
measures the restitution quality in the mean of the overall map. We denote by MSE 
the expectation (with respect to the variable X) of MSE(X). When it is possible, we 
provide new simulations of the map Y(X, z) for randomized values of X, and we use 
this test sample to estimate the MSE. For some applications, this is not possible and 
cross-validation methods can be used to estimate the MSE (see Section H]). 



Y(X,z)-lWX,z 



1 2 



over X 



The MSE can also be obtained by first integrating 
and then by taking the expectation with respect to z. From the MSE, we also define the 
predictivity coefficient Q2 which gives us the percentage of the mean explained variance 
of the output map: 

g2 = 1 ~E z {Var x [Y(X,z)]} ' {U) 

The variance is taken with respect to X because we are interested in the variability 
induced by the model input vector X. Q2 corresponds to the coefficient of determination 
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R 2 computed in prediction (on a test sample or by cross-validation). 
Step 3 - Choosing k*, an optimal value for k 

We perform simulations using the Campbell2D function and study convergence of MSE 
(Eq. (fill) ) as function of k. Our goal is to compare the three methods proposed in step 2, 
then to heuristically find an optimal value k* for k. Indeed, there is a trade-off between 
keeping k small and minimizing the MSE. The MSE is computed using a test sample of 
1000 independent Monte Carlo simulations, giving 1000 output maps. 

Figure [3] gives the MSE results as function of k for different values of the learning 
sample size n. For each method, the MSE curves regularly turn downward as n increases. 
As expected, method 3, which is the richest in terms of model complexity, gives the best 
results, especially for small values of k. The usefulness of Gp is proved as we see that 
method 2 performs badly. It is certainly caused by the behavior of the first selected 
coefficients, which offer strong and non-linear variations: linear models are irrelevant for 
modeling these coefficients. For each method, the convergence is reached for k around 20 
- 25. We decided to fix the optimal value at k* = 30, which is a reasonable number of Gp 
models to be built. 

In real applications, this methodology for choosing k* can be applied even if the 
learning sample size n is limited. For a fixed n, we look for a stabilization of the MSE. If 
this convergence is not reached, we use a predefined maximal value for k. 

It should be noted that if new model runs are available, the analyst has to repeat the 
process to choose k*. However, in order to gain some analysis time, we can leave unchanged 
the ordering of coefficients, which has been obtained with the first set of simulations. In 
addition, we can just update the predictor (Eq. (jlj) by keeping the initial estimation of 
the correlation parameters (which is the most cpu time consuming step). Such choices 
have to be made with care. 
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MSE for n=30, 40, 60, 70, 80, 90, 100, 120, 140, 160, 180, 200, 300, 400, 500 
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Figure 3: For the Campbell2D function, MSE convergence (as function 
of k) for the three methods and for various learning sample sizes (n = 
30, 40, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180, 200, 300, 400, 500). 

Step 4 - Convergence as function of the learning sample size n 

Finally, it is important to study the convergence of the adequacy criteria as function of 
the learning sample size n. It would allow us to eventually prescribe the need to make new 
simulations with the code. For the Campbell2D function, Figure H] gives the MSE results 
as function of n for different values of k. For each method, the MSE curves regularly turn 
downward as k increases. In real applications, one can restrict this to the visualization of 
the k* curves. 

Method 2 performs badly and the stabilization of its curves is obtained earlier. Indeed, 
adding simulations does not improve the linear models fitted on the k coefficients. For 
methods 1 and 3, the curve stabilization is not reached at n = 500. MSE would decrease 
for larger values of n, but this decrease becomes slower from n = 200 and MSE results are 
rather satisfactory for this value n = 200. In terms of predictivity coefficient (Eq. ffl~2l)). 
we obtain Q 2 = 96.6% for n = 500 and Q2 = 92.9% for n = 200. For methods 1 and 3, 
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increasing k and n leads to a systematic decrease of the MSE. It can therefore be argued 
that MSE tends to zero and that our methodology converges. 
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Figure 4: For the Campbell2D function, MSE convergence (as function of n) for the three 
methods and for various numbers k of Gp-modeled coefficients. 

In real applications, if no additional simulation can be made, this step can be optional. 
However, in the opposite case, these curves would help us to decide if our simulation num- 
ber is sufficient and which method we have to choose. Moreover, knowing that method 3 
can be costly, we can decide to choose method 1 if their MSEs are similar. In practical 
terms, we start from an initial no (random selection of no simulations among the n simu- 
lations) and randomly add simulations until n. The choice of a low-discrepancy sequence 
would also allow the space-filling properties of the design to be kept while increasing n. 

In conclusion, by analyzing all these convergence plots, we choose in the next section 
to use a learning sample size n = 200 and to model k* = 30 Gps using method 3 in order 
to compute Sobol' indices. 

Coarse estimation of the computational time of the different steps 
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Concerning the computational time needed to carry out all our methodology, the most 
costly steps are the construction of each Gp metamodel for the k* wavelet coefficients and 
the validation step (i.e. computation of MSE or Q 2 by cross validation). All the other 
steps such as the wavelet decomposition, the selection of coefficients or the prediction of 
the functional metamodel for any new input value are negligible in terms of computational 
time. 

So, the first main difficulty is the hyperparameter estimation of the k* Gp metamodels. 
Indeed, each computation of the likelihood requires the inversion of correlation matrix and 
consequently, the maximum likelihood estimation can be CPU time consuming. In the 
case of 10 inputs for example and a few hundreds of simulations, a Gp modeling usually 
requires several minutes on a standard PC (Pentium 4, 1.8GHz). So, for tens of coefficients 
to be modeled, the step 2 can take one hour. 

The second difficulty is the validation step, also because of the time required by 
the maximum likelihood estimation. To reduce its computational cost, a k-fold cross- 
validation is preferable in practice and limits the time required for cross validation to just 
a few hours. Another solution is to leave unchanged the hyperparameters of Gp at each 
loop of cross validation. Only the Gp predictor is updated. The cross validation is then a 
little biased but, for a few hundreds of simulations, this bias becomes quickly negligible. 

As a conclusion, only the step of the Gp modeling is computationally expensive. For 
instance, in the Campbell2D function study, with d — 8 inputs, n z = 4096 pixels, k 
ranging from 10 to 50 Gp models and n = 200 simulations, the metamodeling process 
from steps 1 to 3 (without the convergence plot in function of n) required approximately 
one day. For the MARTHE test case, with d = 20, n z = 4096, n = 300, k = 100 and with 
a 10-fold cross-validation process, the computation of all the Sobol' indices has required 
approximately two days. These operational cost may appear to be high but this process 
is only made once to obtain a full functional metamodel. Afterwards, any evaluation of 
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the metamodel will require a negligible computational time compared to a simulation of 
the initial MARTHE simulator. 



3.4 Global sensitivity analysis 

At this stage, we have a functional metamodel allowing us to predict new output concen- 
tration maps for any new set of input variables. This metamodel has been obtained with 
only n = 200 computations with the Campbell2D function. To estimate Sobol' indices 
of the overall output map of the Campbell2D function, we then perform thousands of 
simulations on our functional metamodel. This method is called hereafter the functional 
metamodel-based approach. 

Note that there is no direct link between the Sobol' indices for the wavelet coefficients 
and the Sobol' indices for the output. Indeed from (JHJ), we have 

K 

y Jf (X,z) = ^)+^a i (X)0 i (z), 

where {aij@Q)i<j<K denote the wavelet coefficients. The sensitivity map with respect to 

the variable Xi is SAz) = — ' — — — . Hence, 

Var[Y^(X,z)J 



Zf^CoyiEia^lXlEia^lX^iz^) 
Var[Y^(X,z)] 



SAz) 



If the functions {<l>j{z))i<j<K have disjoint supports, all the terms with I =fi j in the 
above formula equal zero and Sobol' indices for the wavelet coefficients could be used to 
compute Sobol' indices for the output. In this paper, this is not the case as we use the 
Daubechies basis for these functions. This basis gave much better results than bases with 
disjoint supports functions (such as the Haar basis). 

Thus, to estimate Sobol' indices of the overall output map of the Campbell2D function, 
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we perform thousands of simulations on our functional metamodel. Because of constraints 
of memory allocation (due to the size of the output map and our vectori al programmin g 



2OO2I). 



constraints), it is not possible to use Saltelli's Monte Carlo algorithm (jSaltelli . 
Therefore, we use the following procedure for each of the n z nodes of the grid: 

• For the variance of the conditional expectation of each input variable Xi (i = 
1, . . . , 8), we perform 1000 Monte Carlo computations to estimate E(Y|Xj) (integra- 
tion over 7 dimensions) and 200 Monte Carlo computations to estimate Var[E(Y|JQ)] 
(integration over one dimension). 

• For the variance of the conditional expectation of each X^i (i = 1, . . . , 8), we perform 
100 Monte Carlo computations to estimate E(F|X^j) (integration over one dimen- 
sion) and 1000 Monte Carlo computations to estimate Var[E(Y|X^j)] (integration 
over 7 dimensions). 

• The variance of the output Var(Y) is obtained using 2 x 10 4 simulations (integration 
over 8 dimensions). 

• Thus, the first order Sobol' index estimates (noticed S° p ) are obtained from Eq. ([I]) 
and the total Sobol' index estimates (noticed S^) are obtained from Eq. (J2j). 

Finally, we obtain the Sobol' indices Sf p (z) and S^ p (z) for all the n z grid points. 

Figure [5] shows the Sobol' index maps for X2 and Xq, which are the most influential 
input variables in the Campbell2D function (see Fig. [2]). Results for X 2 are partic- 
ularly convincing: first order and total sensitivity values obtained with the functional 
metamodel-based approach are accurate everywhere in the spatial domain D z . Results 
for Xq are fairly good for the first order Sobol' index and less precise for the total Sobol' 
index. However, the spatial influence zone of Xq in the upper left corner is well retrieved 
by the functional metamodel-based approach. In fact, X2 corresponds to a solely influen- 
tial input variable while X e has significant interactions with other input variables (mainly 
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with X 3 ). Therefore, because of a more difficult Gp fitting process, the Gp models of the 
wavelet coefficients of X 6 are less precise than the Gp models of the wavelet coefficients 
of Xi- However, we argue that the important information is present in the spatial SoboF 
map of SS p (z). 
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Figure 5: For the Campbell2D function and variables X2 and Xq, comparison between 
exact first order and total SoboF indices (top) and functional metamodel-based SoboF 
indices (bottom). The color scales are the same for all the plots. 



For all the input variables, the relative mean absolute errors of the first order SoboF 
indices, 



rMAE(Si) 



E z |Sf p (z) - Si{z) 



(13) 



K[Si(z)} 

were estimated for i = 1, ... ,8 (see Table [1]). The results of Table [1] show that the 
estimations of the sensitivity maps for X 2 and X$ correspond to one of the most difficult 
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Table 1: For the Campbell2D function, relative mean absolute errors (in percent) of the 
first order sensitivity indices estimated via the functional metamodel-based approach. 
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cases. Figure [5] shows that a mean absolute error of a 15 %-order is quite satisfactory in 
terms of sensitivity maps. Therefore, all the results for the other input variables show that 
our functional metamodel-based approach gives precise results. Note that the rMAE value 
for X 5 is not given because S 5 (zi,z 2 ) = W(z 1 ,z 2 ) G [— 90, 90] 2 , and the denominator in 
Eq (fTBl is equal to zero. 

In conclusion, we have shown the efficiency of this new spatial global sensitivity anal- 
ysis method for this analytical and relatively complex test function: all sensitivity index 
spatial maps have been obtained using only n = 200 computations of the Campbell2D 
function. 

4 APPLICATION 

4.1 The environmental problem 

In the period between 1943 and 1974 radioactive waste was buried in eleven temporary 
repositories built on a specially allocated site at the RRC Kurchatov Institute (KI) in the 
Moscow area (Russia). The site used for radioactive waste interim storage covers an area of 
about 2 hectares and is situated near the KI external perimeter in the immediate vicinity of 
the city's residential area. A radioactive survey of the site and its adjacent area performed 
in the late 1980s - early 1990s and in 2002 showed that radioactive contamination is not 
only present on the surface but has a tendency to spread into the groundwater. The 
porous media of the site is represented principally by sands alternatively with clays that 
form several horizontal superposed aquifers. To analyze radioactive contamination of 
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groundwater, about a hundred exploration wells were drilled on the site. As a result of the 
survey, it was discovered that contamination of groundwater concerns mainly connected 
to 90 Sr. Since the radiation survey results have demonstrated the necessity to clean up the 
site, rehabilitation activities on radwaste removal and liquidation of old repositories were 
performed at the site between 2002 and 2006. A network of observation wells is used to 
control groundwater conditions of the two upper aquifers. This network consists of twenty 
observation wells for the upper moraine aquifer and nine for the second Jurassic aquifer. 
It is used for a regular recording of groundwater levels, its chemical and radionuclide 



composition (see 



Velikhov et al 



20071 ). 



A numerical model of 90 Sr transport in gr oundwater was 



chatov Institute (KI) radwaste disposal site (jVolkova et al.l . 120081 ). It aimed to provide a 



devel oped for the RRC Kur- 



correct prediction of further contamination plume spreading since 2002 (using an interpo- 
lated concentration map) and up to the end of the year 2010, to show the risks associated 
with contamination and to serve as a basis for engineering decision-making. The numerical 
model was constructed using the MARTHE hydrogeological program package (developed 
by BRGM, the French Geological Survey). It is a three-dimensional combined transient 
flow and transport convection-dispersion model taking into account sorption and radioac- 
tive decay. Three layers were singled out; horizontal, vertical and temporal meshes were 
chosen in accordance with the migration characteristics of the sand. Initial concentration 
plume in 2002 and spreading prediction made for the year 2010 are shown in Figure |6j 
As can be seen, contamination plume predicted for the year 2010 is not uniform and is 
more diffused than the initial one. This is due, above all, to the influence of intensive 
infiltration assigned to several zones of the model domain that results in local dispersion 
of the contamination plume. 



It has been shown in 



Volkova et al 



(120081 ) that the shape of the predicted contami- 



nation plume depends on the model input values (hydraulic conductivity, infiltration pa- 
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Figure 6: Initial (left, 2002) and predicted (right, 20 10) 90 Sr concentrations (hot colors 
represent higher levels of concentration). Initial concentrations range from to 12 Bq/1 
while final concentrations range from to 8 Bq/1. The small white rectangles represent 
the location of the observation wells. 

rameters, sorption distribution coefficients, etc.). Indeed, a large part of the model input 
variables are exposed to some uncertainty, since their values have been obtained through 
expert judgment, model calibration, field experiments and laboratory experiments. These 
uncertainties lead to uncertainties in model prediction. In order to evaluate the degree 
of input influence on the resulting contamination plume shape and concentration values 
predicted in observation wells, it was proposed to perform global sensitivity analysis on 
this numerical model (called MARTHE in the following sections). 



4.2 Global sensitivity analysis on scalar outputs 

From expert judgment and laboratory experiments, probability distributions (uniform and 
Weibull laws) were assigned to 20 random input variables of MARTHE. 300 Monte Carlo 



simul ations, based upon Latin hypercube sampling of the input variables ( iMcKay et al. 



19791 ). were performed (requiring four calculation days). For each simulated set of input 
variables, MARTHE computes transport equations of 90 Sr and predicts the evolution of 
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90 Sr concentration. The 20 uncertain model parameters are the permeability of different 
geological layers composing the simulated field, longitudinal and transverse dispersivity 
coefficients, and sorption distribution coefficients. To perform global sensitivity analysis 
and in particular to compute Sobol' indices, previous studies have concentrated on 20 
scalar outputs of 90 Sr concentration values, predicted for the year 2010, in 20 piezometers 
located on the waste repository site. 

Because of the long computing tim e of MARTHE and o f the non-linearity of the rela- 



tionships between inputs and outputs, 



Volkova et al. 



(120081 ) proposed to fit a metamodel 



(based upon the boosting of regression trees) on each output using the learning sample 
(300 observations). The boosting trees method consists of a sequential construction of 
weak models (here regression trees with low interaction depth), that are then aggregated. 
This leads to a relatively efficient metamodel (but difficult to interpret). Then Sobol' 
indices were compu ted by intensive Monte Carlo simulations using this metamodel. In 



Marrel et al. 



(120081 ) . each output was modeled by a Gp metamodel. The Gp metamodel 
outperforms the linear regression and the boosting regression trees metamodel in terms 
of predictivity of the output values. 

As a result of these sensitivity analyses, we note that the calculated concentrations at 
the piezometric locations are mainly influenced by the distribution coefficient of 90 Sr in the 
first and second layers of the domain and by the intensity infiltration in the pipe leakage 
zones, and to a lesser extent by the hydrodynamic parameters (dispersivity, porosity, etc.). 
However, we are aware that spatial information has been lost in these analyses, due to the 
limited amount of output values that we have considered (concentrations located at 20 
locations). Our goal was then to compute Sobol' indices in the whole spatial concentration 
map, predicted by the model for 2010. 
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4.3 Global sensitivity analysis on the output concentration map 

The methodology presented in the previous section was then applied to MARTHE. Re- 
member that this model contains d = 20 input random variables and that the n = 300 
simulations have been performed following a Latin hypercube sample in a previous work. 
In previous studies, 20 scalar output variables had been considered and we hoped to ob- 
tain more information by using all the spatial information contained in the maps. We 
used the 300 spatial output maps, discretized in n z = 4096 pixels and predicting the 90 Sr 
concentration values in 2010. 

Figure [7] (a) and (b) shows two output maps and exemplifies the potential variability 
between the maps and their contour irregularity. Another output map (Figure |6l right) 
confirms this observation. The variance of the 300 maps (Figure [7] (c)) allows us to 
illuminate the strong- variability zones (central spot), the mild- variability zones (on the 
left and at the top of the central spot) and the zones with no variability where the 
concentration values are equal to zero (the major part of the maps). All this corroborates 
the need for a non-trivial functional metamodel, such as our wavelet-Gp based metamodel 
decribed in Section 13.31 




Figure 7: (a) and (b): Two final concentration maps of MARTHE (units in Bq/1). (c): 
Variance of the 300 concentration maps (colors are in logarithmic scales, ranging from 
to 10). 
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As step was already done, we applied the remaining steps of the spatial global 
sensitivity analysis methodology (see Section [3]) using our learning sample of size n = 
300. From steps 2 and 3, we retained method 3 with the choice of k* = 100 modeled 
coefficients with Gp: the stabilization of MSE was observed for this value of k*. The 
number of coefficients modeled with linear models is k' = 900. Step 4 was not applied to 
this ap plication case. Indee d, MARTHE simulations have been performed in a previous 



study (jVolkova et al. 



20081 ) and the computer code is no longer available. Therefore, no 
additional point could be added and step 4 would be useless. 

In the MARTHE application, no test basis was available to compute the MSE in 
prediction. The MSE estimate was obtained via a 10-fold cross-validation technique. The 
learning sample was randomly divided into 10 sub-samples. Then, we iterated 10 times 
the following process: learning the functional metamodel on 9 sub-samples and estimating 
the MSE on the remaining sub-sample. Our final MSE estimate is the mean of the 10 
obtained MSE values: MSE= 0.039. In terms of predictivit y coeffi c ient ( Eq. (TT2l). we 



Marrell ( 120081 ) 



obtain Q 2 = 72.1%. All the details of this study are given in 

At present, the functional metamodel can be used to estimate first order and total 
Sobol' indices. We used Saltelli's Monte Carlo algorithm (as for the total Sobol' indices in 
Section I3~2|) with iV = 10 3 . Indeed, the low computational cost of our metamodel makes it 
possible to carry thousands of simulations, but not billions because of memory allocation 
problems (see Section |3T4"|) . The final computation cost of Saltelli's algorithm is N(d + 2), 
which leads to a number of 22000 metamodel-based simulations in our case. As a final 
result, we obtain 20 maps of first order Sobol' indices and 20 maps of total Sobol' indices 
(two maps for each input). 

Figure M (a), (b) and (c) shows three maps of total Sobol' indices corresponding to 
the three main influential variables. The 17 remaining input variables have no influence 
in any zone of the spatial output domain. These results are completely coherent with pre- 
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vious studies which have detected the predominant influence of these three variables. Our 
new results have provided some additional spatial information. For example, we locate 
more precisely the influence zones of the distribution coefficient of the first hydrogeological 
layer. Such information is precious for model engineers. It could help them to determine 
according to the spatial location of large variability zones the kind of additional informa- 
tion which is needed. Subsequent decisions could be to place new piezometers in specific 
geographical zones. The methodological developments highlight not only the direct appli- 
cation to post-treatment processes but also enable us to propose a new characterization 
strategy. 

Figure [8] (d) gives spatial information about the MARTHE model. It clarifies the 
obvious correlation between the MARTHE hydrogeological scenario and our obtained 
spatial maps of sensitivity indices: influential kdl zones correspond to the absence of 
the second hydrogeological layer while influential kd2 zones correspond to its presence. 
In Figure |8] (c), we also retrieve the high infiltration lines of Figure |8] (d) and see their 
spatial area of influence. 

In our radioactive waste problem, the Sobol' maps of each uncertain input parameter 
clearly provide guidance to a better understanding of the simulator forecast and can be 
used to reduce the response uncertainties most efficiently. For example, if we want to 
reduce the predicted concentration uncertainty at a specific point of the map, we analyze 
all the Sobol' maps and determine the most influential inputs at this point. Then, we can 
try to reduce the uncertainty of these inputs by additional measures. Moreover, spatial 
maps for sensitivity indices can reveal gradient of influence of uncertain parameters, linked 
to the physics of the phenomenon (e.g. influence of a parameter varying in function of the 
flow direction). The global influence of each input over the whole space can also be used 
to identify areas of influence and areas of non-influence of this input and can be linked, 
as for kdl and kd2, to a map of a geological parameter. If we now consider the strong 
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Figure 8: Total Sobol' indices of three input variables of MARTHE: (a) kdl (distribution 
coefficient of the first layer), (b) kd2 (distribution coefficient of the second layer) and (c) 
i3 (high infiltration rate), (d): MARTHE hydrogeological model: blue zones (numbered 
from 1 to 4) correspond to low conductivity zones (absence of coarse sand in the second 
layer); lines present zones of high infiltration rates. 



infiltration coefficient denoted as i3 and its sensitivity map, we can deduce that i3 is only 
influential around the pipe and its influence is very limited outside the pipe area. The lack 
of knowledge on this parameter does not induce a big uncertainty on the concentration 
forecast at the site boundary and consequently on the decision relative to the need of a 
site rehabilitation. 
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5 CONCLUSION 



In this paper, a new methodology was introduced to compute spatial maps of variance- 
based sensitivity indices (such as the Sobol' indices) for numerical models giving spatial 
maps as outputs. Such situations often occur in environmental modeling problems. One 
critical issue with our method is due to the reduced number of model output maps available 
because of the high cpu time cost of the numerical model. A functional basis decompo- 
sition (wavelet basis) linked to a metamodel technique (based upon the Gp model) is 
proposed and used to solve this problem. Choosing a wavelet basis is well-suited for our 
application cases (analytical and real models) because strong spatial heterogeneities and 
sharp boundaries are observed in the model output maps. In addition, the Gp model is ap- 
propriate for handling the large differences between the output maps obtained for various 
inputs. This induces strong non-linear variations in the Gp-modeled wavelet coefficients. 
The resulting functional metamodel is a fast emulator (i.e. with negligible cpu time) of the 
computer code. It can be used for uncertainty propagation issues, optimization problems 
and, as advocated in this paper, for sensitivity index estimation. 

An analytical test function was presented to explain the different steps, criteria and 
modeling choices of our methodology. The convergence of our Gp-based functional meta- 
model was also investigated. Then, our methodology was applied to a real case to stress 
its concrete applicability. We particularly emphasized the relevance of the additional in- 
formation (in addition to the expert and model knowledge) brought by the spatial maps 
of first order and total sensitivity indices. These sensitivity maps allow us for spatially 
identifying the most influential inputs, for detecting zones with input interactions and for 
determining the zone of influence for each input. 

Our methodology can be extended to any computer codes with functional outputs: 
codes with outputs depending on time, codes depending on other physical processes 
(such as a function of temperature), codes with outputs varying in space and time. In 
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the third case, the temporal and the spatial scales must be carefully distinguished. It 
would be interesting in a future work to apply our method to the MARTHE spatio- 
temporal evolutions of the concentration values (between 2002 and 2010). In addition, 
i mprovements could be proposed. For example, the vagu elette- wavelet decomposition 



(lAbramovich and Silverman 



1997 



Ruiz-Medina et al. 



20071 ) would be an interesting sub- 



stitute to the wavelet decomposition. It would allow a simultaneous treatment of all 
the spatial output maps and a direct standardization of all decomposition coefficients. 
Last, dealing with the functional input case remains an important and challenging issue 
to disseminate the globa l sen sitivity analysis into env i ronme ntal modeling communities. 



Iooss and Ribatet 



(J2009J) and 



Lilburne and Tarantolal (120091 ) proposed some preliminary 



methodologies to account for the spatially distributed inputs when computing Sobol' in- 
dices. 
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APPENDIX A: SOBOL INDICES FOR THE CAMP- 
BELL2D FUNCTION 

The analytical derivations of the first order Sobol' indices Si (Eq. (CQ)) of the Campbell2D 
function ([6]) consists, first of all, in obtaining analytical expressions of the conditional 
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expectations E (Y\Xi) (for i — 1, . . . , 8). The multiple integrations are made following the 
uniform distribution on [—1, 5] (we have E(JQ) = 2 and Var(Xj) = 3 V i = 1, . . . , 8). The 
terms of these integrals which do not depend on X-i can be directly put to zero (because 
these terms disappear when the variance over Xi is taken). In the next step, we take the 
variance over Xi of the expressions of the conditional expectations (which leads to simple 
integrals). In some cases, analytical simplifications can be made but in other cases, these 
variances cannot be simplified and the integrals are evaluated by Monte Carlo. 
We recall that (z 1 ,z 2 ) £ [— 90, 90] 2 and we define the following variable changes: 

e 1 = 0.8^1 + 0.2z 2 , 6> 2 = 0.5^1 + 0.5,22 , 0i = 0.4,2i + 0.6z 2 , <p 2 = 0.3^ + 0.7,22 • (14) 



The Campbell2D function is now written 



g(X,z 1 ,z 2 ) = Xiexp 
+X 5 (X 5 -2)exp 



(9 1 -10X 2 ) 2 

QOXf 
'(0! - 20X 6 ) 2 

40X| 



+ (X 2 + X 4 ) exp 
+ (X 6 + X 8 ) exp 



9 2 Xi 

500 
> 2 X 7 



250 



(15) 



We also define $(z) as the cumulative distribution function of a standardized Gaussian 
variable. The first order Sobol' indices for the 8 input variables are written: 



S 1 (z u z 2 ) = X a r^l-^X 2 1 



TV 



5O-0i 



30X ± 



10 + 01 



30Xi 



+ 4 exp 



9 2 Xi 
500 



(16) 
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